02-2: 2nd Order Spatial Point Patterns Analysis Methods

Author

Yiqiong PAN

Published

September 4, 2025

Modified

September 5, 2025


1 Overview

This exercise applies second-order spatial point pattern analysis (SPPA), using the spatstat package, to childcare centres in Singapore. First-order analysis describes how the density of points varies across the study area; second-order methods instead examine how the centres (points) relate to and influence one another as a function of the distance between them.

We aim to answer two key questions:

  1. Are childcare centres randomly distributed across Singapore?

  2. If not, where are the areas with higher concentrations?

2 The data

The same two datasets used in the 1st order SPPA exercise are used here. Both were downloaded from publicly available websites and are available in KML and GeoJSON formats.

Dataset Name | Source | Description
Child Care Services | data.gov.sg | Point feature data: contains the locations and attributes of childcare centres.
Master Plan 2019 Subzone Boundary (No Sea) | singstat | Polygon feature data: represents the URA 2019 Master Plan planning subzone boundaries.

3 Installing and Loading the R packages

Five R packages are used in this exercise.

Package | Description
sf | Simple Features, a new R package which handles importing, managing, and processing vector-based geospatial data.
spatstat | Provides useful functions for SPPA, which will be called to conduct both 1st and 2nd order SPPA and KDE.
tmap | Creates high-quality static maps, or interactive maps via leaflet.
rvest | Scrapes and extracts data from web pages.
tidyverse | A collection of data science packages; here it supplies the dplyr verbs, purrr's map_chr() and keep(), and readr's write_rds().

p_load() of the pacman package installs any of these packages that are missing and then loads them into the R environment.

pacman::p_load(sf, spatstat, tmap, rvest, tidyverse)

4 Data Import and Preparation

4.1 Importing Data

The following code chunk imports the Master Plan 2019 Subzone Boundary (No Sea) data with st_read(), extracts the four required fields from the HTML held in the Description column, filters out the outlying islands, and finally saves the result as mpsz_cl for further analysis.

mpsz_sf <- st_read("data/MasterPlan2019SubzoneBoundaryNoSeaKML.kml") %>%
  st_zm(drop = TRUE, what = "ZM") %>%
  st_transform(crs = 3414)
Reading layer `URA_MP19_SUBZONE_NO_SEA_PL' from data source 
  `C:\lsrgc\ISSS626-yiqiong-pan\Hands-on_Ex\Hands-on_Ex02\data\MasterPlan2019SubzoneBoundaryNoSeaKML.kml' 
  using driver `KML'
Simple feature collection with 332 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
extract_kml_field <- function(html_text, field_name) {
  if (is.na(html_text) || html_text == "") return(NA_character_)
  
  page <- read_html(html_text)
  rows <- page %>% html_elements("tr")
  
  value <- rows %>%
    keep(~ html_text2(html_element(.x, "th")) == field_name) %>%
    html_element("td") %>%
    html_text2()
  
  if (length(value) == 0) NA_character_ else value
}
# map_chr() of purrr (tidyverse) applies a function to each element of a list/vector and returns a character vector.
mpsz_sf <- mpsz_sf %>%
  mutate(
    REGION_N = map_chr(Description, extract_kml_field, "REGION_N"),
    PLN_AREA_N = map_chr(Description, extract_kml_field, "PLN_AREA_N"),
    SUBZONE_N = map_chr(Description, extract_kml_field, "SUBZONE_N"),
    SUBZONE_C = map_chr(Description, extract_kml_field, "SUBZONE_C")
  ) %>%
  select(-Name, -Description) %>%
  relocate(geometry, .after = last_col())
mpsz_cl <- mpsz_sf %>%
  filter(SUBZONE_N != "SOUTHERN GROUP",
         PLN_AREA_N != "WESTERN ISLANDS",
         PLN_AREA_N != "NORTH-EASTERN ISLANDS")
write_rds(mpsz_cl,
          "data/mpsz_cl.rds")

The code chunk below imports the downloaded ChildCareServices data into R as an sf data frame named childcare_sf using st_read(), converts the geometry from 3D to 2D with st_zm(), and finally transforms the CRS from WGS 84 to SVY21 (EPSG:3414).

childcare_sf <- st_read("data/ChildCareServices.geojson") %>%
  st_zm(drop = TRUE, what = "ZM") %>% # Drop Z and M to convert from multi-dimensional to 2d (XY)
  st_transform(crs = 3414)
Reading layer `ChildCareServices' from data source 
  `C:\lsrgc\ISSS626-yiqiong-pan\Hands-on_Ex\Hands-on_Ex02\data\ChildCareServices.geojson' 
  using driver `GeoJSON'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6878 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
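
To see what the st_zm() and st_transform() steps do in isolation, the sketch below applies them to a single made-up 3D point rather than the real dataset. The coordinates and object names are illustrative assumptions, not a record from ChildCareServices.

```r
library(sf)

# One hypothetical XYZ point in WGS 84 (longitude, latitude, z = 0);
# the coordinates are made up for illustration.
pt_wgs84 <- st_sfc(st_point(c(103.8, 1.35, 0)), crs = 4326)

# Drop the Z dimension, then project to SVY21 (EPSG:3414).
pt_2d    <- st_zm(pt_wgs84, drop = TRUE, what = "ZM")
pt_svy21 <- st_transform(pt_2d, crs = 3414)

st_crs(pt_svy21)$epsg      # EPSG code of the projected point
st_coordinates(pt_svy21)   # X and Y only, now in metres
```

The same two checks, st_crs() and st_coordinates(), can be run on childcare_sf to confirm that the import behaved as intended.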

Using the tmap mapping methods, the code chunk below creates a map combining childcare_sf and mpsz_cl.

tm_shape(mpsz_cl)+
  tm_polygons() +
  tm_shape(childcare_sf) +
  tm_dots(size = 0.3)

Alternatively, an interactive thematic map can be plotted using the code below. The interactive map is easy to navigate and query. The background map layer can optionally be changed (choices: Esri.WorldGrayCanvas (default), OpenStreetMap, Esri.WorldTopoMap).

tmap_mode('view')
tm_shape(childcare_sf) +
  tm_dots() #creates a layer of dots to visualise point data on a map.
tmap_mode('plot') # switch back to static maps
Warning

Always switch back to plot mode after drawing an interactive map to save connection consumption, and limit the number of interactive maps in one published document to 10.

4.2 Geospatial Data Wrangling

In this section, the sf objects are converted to spatstat data structures: ppp for point data and owin for the observation window.

Here we use as.ppp() of the spatstat package to convert the point data childcare_sf to a ppp object, confirm the change using class(), and get a quick overview of the data via summary().

childcare_ppp <- as.ppp(childcare_sf)
class(childcare_ppp)
[1] "ppp"
summary(childcare_ppp)
Marked planar point pattern:  1925 points
Average intensity 2.417323e-06 points per square unit

Coordinates are given to 11 decimal places

Mark variables: Name, Description
Summary:
     Name           Description       
 Length:1925        Length:1925       
 Class :character   Class :character  
 Mode  :character   Mode  :character  

Window: rectangle = [11810.03, 45404.24] x [25596.33, 49300.88] units
                    (33590 x 23700 units)
Window area = 796335000 square units
plot(unmark(childcare_ppp), main = "childcare_ppp") # drop the marks first: plotting the marked pattern displays the raw HTML tags (<th>, <td>, <table>) held in Description

Before moving forwards, let’s check if there are any duplicated points.

any(duplicated(childcare_ppp))
[1] FALSE
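
One caveat: for a marked pattern such as childcare_ppp, duplicated() in spatstat treats two points as duplicates only when both their coordinates and their marks match, so centres that share a location but have different names would not be flagged. The sketch below illustrates this on a small made-up pattern (the coordinates, marks and object names are assumptions, not the childcare data) and shows rjitter() as one possible way to separate coincident points.

```r
library(spatstat)

# Hypothetical toy pattern: the first two points share a location
# but carry different marks.
X <- ppp(x = c(1, 1, 2), y = c(1, 1, 2),
         window = owin(c(0, 3), c(0, 3)),
         marks = factor(c("a", "b", "c")))

any(duplicated(X))          # marks differ, so no duplicates are reported
any(duplicated(unmark(X)))  # the coordinates alone do coincide

# One option: move each point by a small random displacement.
set.seed(1234)
X_jit <- rjitter(X, radius = 0.01, retry = TRUE)
any(duplicated(unmark(X_jit)))
```

Running any(duplicated(unmark(childcare_ppp))) would therefore be a stricter check than the one above.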

Similarly, the owin object can be created using the function as.owin() for polygon data. After the conversion, the class() and plot() functions can be used to verify that the object is of the correct class and that the data retains its original shape.

sg_owin <- as.owin(mpsz_cl)
class(sg_owin)
[1] "owin"
plot(sg_owin)

The code chunk below combines the ppp and owin objects into one ppp object: it replaces the window of childcare_ppp with sg_owin and keeps only the points that fall inside it.

childcareSG_PPP = childcare_ppp[sg_owin]
class(childcareSG_PPP)
[1] "ppp"
childcareSG_PPP
Marked planar point pattern: 1925 points
Mark variables: Name, Description 
window: polygonal boundary
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
summary(childcareSG_PPP)
Marked planar point pattern:  1925 points
Average intensity 2.875208e-06 points per square unit

Coordinates are given to 11 decimal places

Mark variables: Name, Description
Summary:
     Name           Description       
 Length:1925        Length:1925       
 Class :character   Class :character  
 Mode  :character   Mode  :character  

Window: polygonal boundary
41 separate polygons (26 holes)
                  vertices         area relative.area
polygon 1              285  1.61128e+06      2.41e-03
polygon 2               27  1.50315e+04      2.25e-05
polygon 3 (hole)        41 -4.01660e+04     -6.00e-05
polygon 4 (hole)       317 -5.11280e+04     -7.64e-05
polygon 5 (hole)         3 -4.14100e-04     -6.19e-13
polygon 6               30  2.80002e+04      4.18e-05
polygon 7 (hole)         4 -2.86396e-01     -4.28e-10
polygon 8 (hole)         3 -1.81439e-04     -2.71e-13
polygon 9 (hole)         3 -5.99531e-04     -8.95e-13
polygon 10 (hole)        3 -3.04560e-04     -4.55e-13
polygon 11 (hole)        3 -4.46108e-04     -6.66e-13
polygon 12 (hole)        5 -2.44408e-04     -3.65e-13
polygon 13 (hole)        5 -3.64686e-02     -5.45e-11
polygon 14              71  8.18750e+03      1.22e-05
polygon 15 (hole)       38 -7.79904e+03     -1.16e-05
polygon 16              91  1.49663e+04      2.24e-05
polygon 17 (hole)      395 -7.38124e+03     -1.10e-05
polygon 18              40  1.38607e+04      2.07e-05
polygon 19 (hole)       11 -8.36705e+01     -1.25e-07
polygon 20 (hole)        3 -2.33435e-03     -3.49e-12
polygon 21              45  2.51218e+03      3.75e-06
polygon 22             139  3.22293e+03      4.81e-06
polygon 23             148  3.10395e+03      4.64e-06
polygon 24 (hole)        4 -1.72650e-04     -2.58e-13
polygon 25              75  1.73526e+04      2.59e-05
polygon 26              83  5.28920e+03      7.90e-06
polygon 27             106  3.04104e+03      4.54e-06
polygon 28              71  5.63061e+03      8.41e-06
polygon 29              10  1.99717e+02      2.98e-07
polygon 30 (hole)        3 -1.37223e-02     -2.05e-11
polygon 31 (hole)        3 -8.68789e-04     -1.30e-12
polygon 32 (hole)        3 -3.39815e-04     -5.08e-13
polygon 33 (hole)        3 -4.52041e-05     -6.75e-14
polygon 34 (hole)        3 -3.90173e-05     -5.83e-14
polygon 35 (hole)        3 -9.59845e-05     -1.43e-13
polygon 36 (hole)        8 -4.28707e-01     -6.40e-10
polygon 37 (hole)        4 -2.18619e-04     -3.27e-13
polygon 38 (hole)        6 -8.37554e-01     -1.25e-09
polygon 39 (hole)        5 -2.92235e-04     -4.36e-13
polygon 40           14053  6.67892e+08      9.98e-01
polygon 41 (hole)        3 -7.43616e-06     -1.11e-14
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
                     (53270 x 28810 units)
Window area = 669517000 square units
Fraction of frame area: 0.436
plot(unmark(childcareSG_PPP), main = "childcare_SG_PPP")
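
Indexing a ppp object with an owin does two things at once: it discards the points outside the window and replaces the pattern's window. A minimal sketch on a made-up two-point pattern (not the childcare data; all names are hypothetical):

```r
library(spatstat)

# Hypothetical pattern with two points in the unit square.
X <- ppp(x = c(0.2, 0.8), y = c(0.2, 0.8), window = square(1))

# A smaller window covering only the lower-left quarter.
W <- owin(c(0, 0.5), c(0, 0.5))

Y <- X[W]      # keep the points inside W and adopt W as the new window
npoints(Y)     # only the point at (0.2, 0.2) remains
Window(Y)      # the window is now W
```

This is also why the average intensity reported by summary() differs between childcare_ppp and childcareSG_PPP even though both hold 1925 points: the window area in the denominator has changed.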

4.3 Extracting the Study Area

We focus on the childcare centres in the four areas: Punggol, Tampines, Choa Chu Kang and Jurong West.

The code chunk below uses filter() to create an sf object for each planning area and plot() for a quick preview.

pg <- mpsz_cl %>%
  filter(PLN_AREA_N == "PUNGGOL")
tm <- mpsz_cl %>%
  filter(PLN_AREA_N == "TAMPINES")
ck <- mpsz_cl %>%
  filter(PLN_AREA_N == "CHOA CHU KANG")
jw <- mpsz_cl %>%
  filter(PLN_AREA_N == "JURONG WEST")
par(mfrow=c(2,2))
plot(st_geometry(pg), main = "Punggol")
plot(st_geometry(tm), main = "Tampines")
plot(st_geometry(ck), main = "Choa Chu Kang")
plot(st_geometry(jw), main = "Jurong West")

pg_owin = as.owin(pg)
tm_owin = as.owin(tm)
ck_owin = as.owin(ck)
jw_owin = as.owin(jw)

The code chunk below subsets the dataset to the study areas and plots each area with its childcare points. The unit of measurement is left in metres, so all distances reported in the following sections are in metres.

childcare_pg_ppp = childcare_ppp[pg_owin] #crop childcare points to Punggol
childcare_tm_ppp = childcare_ppp[tm_owin]
childcare_ck_ppp = childcare_ppp[ck_owin]
childcare_jw_ppp = childcare_ppp[jw_owin]
par(mfrow=c(2,2))
plot(unmark(childcare_pg_ppp),
  main = "Punggol")
plot(unmark(childcare_tm_ppp),
  main = "Tampines")
plot(unmark(childcare_ck_ppp),
  main = "Choa Chu Kang")
plot(unmark(childcare_jw_ppp),
  main = "Jurong West")
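
The code above does not call a rescaling function, so the four patterns stay in metres, the unit of SVY21. If kilometres are preferred, rescale() from spatstat can convert the unit. The sketch below uses a made-up pattern, and the object names are hypothetical.

```r
library(spatstat)

# Hypothetical pattern with coordinates in metres.
X_m <- ppp(x = c(1000, 2000), y = c(500, 1500),
           window = owin(c(0, 3000), c(0, 3000),
                         unitname = c("metre", "metres")))

# Divide all coordinates by 1000 and relabel the unit as kilometres.
X_km <- rescale(X_m, 1000, "km")

coords(X_km)     # coordinates are now in kilometres
unitname(X_km)
```

Distances returned by Gest(), Fest() and similar functions would then also be in kilometres, so xlim values such as c(0, 500) would need to be scaled accordingly.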

5 Second-order Spatial Point Patterns Analysis

Let us examine the relationships between points at the planning area level.

6 Analysing Spatial Point Process Using G-Function

The G-function, estimated by Gest(), is the cumulative distribution of the distance from an event to its nearest neighbouring event. Monte Carlo simulation with envelope() is used to test the observed pattern against complete spatial randomness (CSR).
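
To make the definition concrete, the sketch below builds a naive version of G by hand on a simulated pattern: it takes each point's nearest-neighbour distance and forms the empirical cumulative distribution, with no edge correction. The simulated pattern and the chosen distance are assumptions for illustration; Gest() should be used for the actual analysis because it applies edge corrections. Under CSR with intensity lambda, the theoretical curve is G(r) = 1 - exp(-lambda * pi * r^2).

```r
library(spatstat)

set.seed(1234)
X <- runifpoint(100)          # 100 uniformly random points in the unit square

d <- nndist(X)                # distance from each point to its nearest neighbour
G_naive <- ecdf(d)            # uncorrected empirical G: share of points with d <= r

lambda <- intensity(X)        # estimated points per unit area
r <- 0.05                     # an arbitrary distance to evaluate
G_naive(r)                    # observed proportion
1 - exp(-lambda * pi * r^2)   # theoretical value under CSR
```

An observed value well above the theoretical one means that more points than expected have a close neighbour, which is the signature of clustering.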

First we set a fixed random seed to ensure reproducibility.

set.seed(1234)
  1. The code chunk below calculates the G-function for Choa Chu Kang. The plot shows that childcare centres are clustered at distances below about 300 metres and look closer to random beyond that scale.
G_CK = Gest(childcare_ck_ppp, correction = "border") # border edge correction
G_CK
Function value object (class 'fv')
for the function r -> G(r)
......................................................
     Math.label      Description                      
r    r               distance argument r              
theo G[pois](r)      theoretical Poisson G(r)         
rs   hat(G)[bord](r) border corrected estimate of G(r)
......................................................
Default plot formula:  .~r
where "." stands for 'rs', 'theo'
Recommended range of argument r: [0, 228.98]
Available range of argument r: [0, 550.4]
plot(G_CK, xlim = c(0, 500)) # xlim restricts the x-axis; the estimate lies above the CSR curve below about 300 m and crosses it beyond that

  2. The code chunk below tests for Complete Spatial Randomness (CSR).

H0 = The distribution of childcare services at Choa Chu Kang is random.

H1 = The distribution of childcare services at Choa Chu Kang is not random.

The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.002 (0.001 in each tail of the two-sided test).

Since the number of simulations is 999, for a two-sided test, nrank = (nsim + 1) * 0.002 / 2 = 1.

nrank is 1 by default, so it does not need to be set. This also applies to the following tests.
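
The relation between nsim, nrank and the significance level can be checked with a line of arithmetic. For a two-sided pointwise envelope the level is 2 * nrank / (nsim + 1); the variable names below are only for illustration.

```r
nsim  <- 999   # number of simulated CSR patterns
nrank <- 1     # the envelope uses the most extreme simulated value on each side

# Two-sided pointwise significance level of the envelope test.
alpha <- 2 * nrank / (nsim + 1)
alpha                    # 0.002

# Going the other way: the nrank needed for a given alpha.
(nsim + 1) * alpha / 2   # 1
```

A stricter or looser test would change nsim or nrank, for example nsim = 39 with nrank = 1 gives a level of 0.05.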

G_CK.csr <- envelope(childcare_ck_ppp, Gest, nsim= 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60..
.......70.........80.........90.........100.........110.........120.........130
.........140.........150.........160.........170.........180.........190........
.200.........210.........220.........230.........240.........250.........260......
...270.........280.........290.........300.........310.........320.........330....
.....340.........350.........360.........370.........380.........390.........400..
.......410.........420.........430.........440.........450.........460.........470
.........480.........490.........500.........510.........520.........530........
.540.........550.........560.........570.........580.........590.........600......
...610.........620.........630.........640.........650.........660.........670....
.....680.........690.........700.........710.........720.........730.........740..
.......750.........760.........770.........780.........790.........800.........810
.........820.........830.........840.........850.........860.........870........
.880.........890.........900.........910.........920.........930.........940......
...950.........960.........970.........980.........990........
999.

Done.
G_CK.csr
Pointwise critical envelopes for G(r)
and observed value for 'childcare_ck_ppp'
Edge correction: "km"
Obtained from 999 simulations of CSR
Alternative: two.sided
Significance level of pointwise Monte Carlo test: 2/1000 = 0.002
.....................................................................
     Math.label     Description                                      
r    r              distance argument r                              
obs  hat(G)[obs](r) observed value of G(r) for data pattern          
theo G[theo](r)     theoretical value of G(r) for CSR                
lo   hat(G)[lo](r)  lower pointwise envelope of G(r) from simulations
hi   hat(G)[hi](r)  upper pointwise envelope of G(r) from simulations
.....................................................................
Default plot formula:  .~r
where "." stands for 'obs', 'theo', 'hi', 'lo'
Columns 'lo' and 'hi' will be plotted as shading (by default)
Recommended range of argument r: [0, 220.38]
Available range of argument r: [0, 550.4]
plot(G_CK.csr, xlim= c(0,500))

The envelope plot (999 CSR simulations, Kaplan-Meier edge correction, 99.8% pointwise envelope) shows the observed G for Choa Chu Kang lying above the envelope at distances up to about 220 m, which implies clustering at short distances.
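
Reading the crossing distance off a plot is approximate. As a complement, the distances at which the observed curve leaves the envelope can be extracted from the envelope object itself. The sketch below does this for a simulated clustered pattern (a Matérn cluster process with made-up parameters, and only 39 simulations to keep it quick), not for the childcare data; the same steps should carry over to G_CK.csr.

```r
library(spatstat)

set.seed(1234)
# Hypothetical clustered pattern in the unit square (parameters are assumptions).
X <- rMatClust(kappa = 10, scale = 0.05, mu = 10)

# Pointwise envelope of G under CSR; verbose = FALSE hides the progress counter.
E <- envelope(X, Gest, nsim = 39, verbose = FALSE)

# An envelope object converts to a data frame with columns r, obs, theo, lo, hi.
env_df <- as.data.frame(E)

# Distances where the observed G exceeds the upper envelope (NA comparisons dropped).
r_above <- env_df$r[which(env_df$obs > env_df$hi)]
if (length(r_above) > 0) range(r_above)
```

Replacing hi with lo and > with < gives the distances where the observed curve falls below the envelope instead.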

  1. The code chunk below calculates the G-function for Tampines.
G_TM = Gest(childcare_tm_ppp, correction = "best") # "best" lets spatstat choose the edge correction (Kaplan-Meier here)
G_TM
Function value object (class 'fv')
for the function r -> G(r)
...................................................................
        Math.label    Description                                  
r       r             distance argument r                          
theo    G[pois](r)    theoretical Poisson G(r)                     
km      hat(G)[km](r) Kaplan-Meier estimate of G(r)                
hazard  hat(h)[km](r) Kaplan-Meier estimate of hazard function h(r)
theohaz h[pois](r)    theoretical Poisson hazard function h(r)     
...................................................................
Default plot formula:  .~r
where "." stands for 'km', 'theo'
Recommended range of argument r: [0, 258.51]
Available range of argument r: [0, 807.07]
plot(G_TM, xlim = c(0, 800)) # xlim restricts the x-axis range

  2. The code chunk below tests for Complete Spatial Randomness (CSR).

H0 = The distribution of childcare services at Tampines is random.

H1 = The distribution of childcare services at Tampines is not random.

The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.002 (0.001 in each tail of the two-sided test).

Since the number of simulations is 999, for a two-sided test, nrank = (nsim + 1) * 0.002 / 2 = 1.

nrank is 1 by default, so it does not need to be set. This also applies to the following tests.

G_TM.csr <- envelope(childcare_tm_ppp, Gest, correction = "best", nsim = 999) # argument names are case-sensitive; "best" resolves to the Kaplan-Meier (km) correction
Generating 999 simulations of CSR  ...
Done.
G_TM.csr
Pointwise critical envelopes for G(r)
and observed value for 'childcare_tm_ppp'
Edge correction: "km"
Obtained from 999 simulations of CSR
Alternative: two.sided
Significance level of pointwise Monte Carlo test: 2/1000 = 0.002
.....................................................................
     Math.label     Description                                      
r    r              distance argument r                              
obs  hat(G)[obs](r) observed value of G(r) for data pattern          
theo G[theo](r)     theoretical value of G(r) for CSR                
lo   hat(G)[lo](r)  lower pointwise envelope of G(r) from simulations
hi   hat(G)[hi](r)  upper pointwise envelope of G(r) from simulations
.....................................................................
Default plot formula:  .~r
where "." stands for 'obs', 'theo', 'hi', 'lo'
Columns 'lo' and 'hi' will be plotted as shading (by default)
Recommended range of argument r: [0, 258.51]
Available range of argument r: [0, 807.07]
plot(G_TM.csr, xlim = c(0,800))

Similarly, the envelope plot (999 CSR simulations, Kaplan-Meier edge correction, 99.8% pointwise envelope) shows the observed G for Tampines lying above the envelope at distances up to about 258 m, which implies clustering at short distances.

7 Analysing Spatial Point Process Using F-Function

The F-function, estimated by Fest(), is the cumulative distribution of empty-space distances, that is, the distance from an arbitrary location in the study area to the nearest event, so it describes the gaps in the pattern. As before, Monte Carlo envelopes from envelope() are used to test against complete spatial randomness.
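
As an illustration of what an empty-space distance is, a naive version of F can be built by hand: drop many test locations at random in the window and measure the distance from each to its nearest event. The sketch below uses a simulated pattern and ignores edge effects, so it is an illustration under those assumptions only; Fest() remains the tool for the analysis.

```r
library(spatstat)

set.seed(1234)
X <- runifpoint(100)                    # simulated events in the unit square
U <- runifpoint(2000, win = Window(X))  # random test locations, not events

# Distance from each test location to the nearest event.
d_empty <- nncross(U, X, what = "dist")
F_naive <- ecdf(d_empty)                # uncorrected empirical F

lambda <- intensity(X)
r <- 0.05
F_naive(r)                              # share of locations within r of an event
1 - exp(-lambda * pi * r^2)             # theoretical F under CSR
```

In a clustered pattern the gaps between clusters are large, so the observed F tends to fall below the CSR curve, the opposite direction to G.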

  1. Computing F-function estimation
F_CK = Fest(childcare_ck_ppp)
F_CK
Function value object (class 'fv')
for the function r -> F(r)
.....................................................................
        Math.label      Description                                  
r       r               distance argument r                          
theo    F[pois](r)      theoretical Poisson F(r)                     
cs      hat(F)[cs](r)   Chiu-Stoyan estimate of F(r)                 
rs      hat(F)[bord](r) border corrected estimate of F(r)            
km      hat(F)[km](r)   Kaplan-Meier estimate of F(r)                
hazard  hat(h)[km](r)   Kaplan-Meier estimate of hazard function h(r)
theohaz h[pois](r)      theoretical Poisson hazard h(r)              
.....................................................................
Default plot formula:  .~r
where "." stands for 'km', 'rs', 'cs', 'theo'
Recommended range of argument r: [0, 304.33]
Available range of argument r: [0, 552.75]
plot(F_CK, xlim= c(0,500))

  2. Performing Complete Spatial Randomness Test
F_CK.csr <- envelope(childcare_ck_ppp, Fest, nsim = 999)
Generating 999 simulations of CSR  ...
Done.
F_CK.csr
Pointwise critical envelopes for F(r)
and observed value for 'childcare_ck_ppp'
Edge correction: "km"
Obtained from 999 simulations of CSR
Alternative: two.sided
Significance level of pointwise Monte Carlo test: 2/1000 = 0.002
.....................................................................
     Math.label     Description                                      
r    r              distance argument r                              
obs  hat(F)[obs](r) observed value of F(r) for data pattern          
theo F[theo](r)     theoretical value of F(r) for CSR                
lo   hat(F)[lo](r)  lower pointwise envelope of F(r) from simulations
hi   hat(F)[hi](r)  upper pointwise envelope of F(r) from simulations
.....................................................................
Default plot formula:  .~r
where "." stands for 'obs', 'theo', 'hi', 'lo'
Columns 'lo' and 'hi' will be plotted as shading (by default)
Recommended range of argument r: [0, 304.33]
Available range of argument r: [0, 552.75]
plot(F_CK.csr, xlim=c(0,500))

  1. Computing F-function estimation

  2. Performing Complete Spatial Randomness Test

8 Analysing Spatial Point Process Using K-Function

8.1 Choa Chu Kang planning area

8.2 Tampines planning area

9 Analysing Spatial Point Process Using L-Function

9.1 Choa Chu Kang planning area

9.2 Tampines planning area

10 Reference

Kam, T. S. 2nd Order Spatial Point Patterns Analysis Methods. R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap05